Information scraps: how and why information eludes our personal information management tools
In this paper we describe information scraps -- a class of personal information whose content is scribbled on Post-it notes, scrawled on corners of random sheets of paper, buried inside the bodies of e-mail messages sent to ourselves, or typed haphazardly into text files. Information scraps hold our great ideas, sketches, notes, reminders, driving directions, and even our poetry. We define information scraps to be the body of personal information that is held outside of its natural or native application.

We have much still to learn about these loose forms of information capture. Why are they so often held outside of our traditional PIM locations and instead on Post-its or in text files? Why must we sometimes go around our traditional PIM applications to hold on to our scraps, such as by e-mailing ourselves? What is information scraps' role in the larger space of personal information management, and what do they uniquely offer that we find so appealing? If these unorganized bits truly indicate the failure of our PIM tools, how might we begin to build better tools?

We have pursued these questions by undertaking a study of 27 knowledge workers. In our findings we describe information scraps from several angles: their content, their location, and the factors that lead to their use, which we identify as ease of capture, flexibility of content and organization, and availability at the time of need. We also consider the personal emotive responses around scrap management. We present a set of design considerations derived from the analysis of our study results, and describe our work on an application platform, Jourknow, built to test some of these design and usability findings.
Efficient crowdsourcing for multi-class labeling
Crowdsourcing systems like Amazon's Mechanical Turk have emerged as an effective large-scale human-powered platform for performing tasks in domains such as image classification, data entry, recommendation, and proofreading. Since workers are low-paid (a few cents per task) and the tasks are monotonous, the answers obtained are noisy and hence unreliable. To obtain reliable estimates, it is essential to utilize appropriate inference algorithms (e.g., majority voting) coupled with structured redundancy through task assignment. Our goal is to obtain the best possible trade-off between reliability and redundancy. In this paper, we consider a general probabilistic model for noisy observations in crowdsourcing systems and pose the problem of minimizing the total price (i.e., redundancy) that must be paid to achieve a target overall reliability. Concretely, we show that it is possible to obtain the answer to each task correctly with probability 1-ε as long as the redundancy per task is O((K/q) log(K/ε)), where each task can take any of K distinct answers (all equally likely), and q is the crowd-quality parameter defined through the probabilistic model. Further, this is effectively the best possible redundancy-accuracy trade-off any system design can achieve. Such a single-parameter crisp characterization of the (order-)optimal trade-off between redundancy and reliability has various useful operational consequences. Further, we analyze the robustness of our approach in the presence of adversarial workers and provide a bound on their influence on the redundancy-accuracy trade-off.
Unlike recent prior work [GKM11, KOS11, KOS11], our result applies to non-binary (i.e., K > 2) tasks. In effect, we utilize algorithms for binary tasks (with an inhomogeneous error model, unlike that in [GKM11, KOS11, KOS11]) as a key subroutine to obtain answers for K-ary tasks. Technically, the algorithm is based on a low-rank approximation of the weighted adjacency matrix of a random regular bipartite graph, weighted according to the answers provided by the workers.
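The redundancy-reliability trade-off above is easy to probe with a toy simulation. The sketch below is a minimal illustration, not the paper's spectral algorithm: it assumes a simple one-coin worker model (correct with probability p_correct, otherwise a uniformly random wrong label; both are assumed parameters) and measures the error rate of plain majority voting as the per-task redundancy grows.

```python
import random
from collections import Counter

def majority_vote_error(K=4, p_correct=0.7, redundancy=5,
                        n_tasks=20000, seed=0):
    """Empirical error rate of per-task majority voting under a toy
    one-coin worker model with K possible answers per task."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_tasks):
        truth = rng.randrange(K)
        votes = []
        for _ in range(redundancy):
            if rng.random() < p_correct:
                votes.append(truth)
            else:  # pick one of the K-1 wrong answers uniformly
                votes.append(rng.choice([a for a in range(K) if a != truth]))
        guess = Counter(votes).most_common(1)[0][0]  # ties broken arbitrarily
        errors += (guess != truth)
    return errors / n_tasks

# Error decays roughly exponentially in the redundancy, matching the
# O((K/q) log(K/eps)) scaling in spirit.
for r in (1, 3, 5, 9, 15):
    print(r, majority_vote_error(redundancy=r))
```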
Globally Optimal Crowdsourcing Quality Management
We study crowdsourcing quality management, that is, given worker responses to
a set of tasks, our goal is to jointly estimate the true answers for the tasks,
as well as the quality of the workers. Prior work on this problem relies
primarily on applying Expectation-Maximization (EM) on the underlying maximum
likelihood problem to estimate true answers as well as worker quality.
Unfortunately, EM only provides a locally optimal solution rather than a
globally optimal one. Other solutions to the problem (that do not leverage EM)
fail to provide global optimality guarantees as well. In this paper, we focus
on filtering, where tasks require the evaluation of a yes/no predicate, and
rating, where tasks elicit integer scores from a finite domain. We design
algorithms for finding the globally optimal estimates of correct task answers and
worker quality for the underlying maximum likelihood problem, and characterize
the complexity of these algorithms. Our algorithms conceptually consider all
mappings from tasks to true answers (typically a very large number), leveraging
two key ideas to reduce, by several orders of magnitude, the number of mappings
under consideration, while preserving optimality. We also demonstrate that
these algorithms often find more accurate estimates than EM-based algorithms.
This paper makes an important contribution towards understanding the inherent
complexity of globally optimal crowdsourcing quality management.
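For context, the EM baseline that this abstract contrasts against can be sketched compactly. The following is a minimal one-coin, Dawid-Skene-style EM for filtering (yes/no) tasks; the worker model, uniform prior, and 0.8 initialization are assumptions for illustration, and the paper's algorithms instead search for the globally optimal mapping rather than iterating to a local optimum.

```python
def em_filtering(responses, n_iters=50):
    """responses: dict task -> list of (worker, answer), answers in {0, 1}.
    Returns (posterior that each task's true answer is 1, estimated
    per-worker accuracy). One-coin model: worker w answers correctly with
    probability acc[w] regardless of the true answer."""
    workers = {w for resp in responses.values() for w, _ in resp}
    acc = {w: 0.8 for w in workers}      # assumed initialization
    post = {t: 0.5 for t in responses}
    for _ in range(n_iters):
        # E-step: posterior over true answers (uniform prior assumed).
        for t, resp in responses.items():
            like1 = like0 = 1.0
            for w, a in resp:
                like1 *= acc[w] if a == 1 else 1 - acc[w]
                like0 *= acc[w] if a == 0 else 1 - acc[w]
            post[t] = like1 / (like1 + like0)
        # M-step: expected fraction of correct labels per worker.
        for w in workers:
            num = den = 0.0
            for t, resp in responses.items():
                for w2, a in resp:
                    if w2 == w:
                        num += post[t] if a == 1 else 1 - post[t]
                        den += 1
            acc[w] = num / den
    return post, acc
```

Each EM iteration can only improve the likelihood locally, which is exactly the limitation the globally optimal algorithms described above are designed to avoid.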
Incentivizing High Quality Crowdwork
We study the causal effects of financial incentives on the quality of
crowdwork. We focus on performance-based payments (PBPs), bonus payments
awarded to workers for producing high quality work. We design and run
randomized behavioral experiments on the popular crowdsourcing platform Amazon
Mechanical Turk with the goal of understanding when, where, and why PBPs help,
identifying properties of the payment, payment structure, and the task itself
that make them most effective. We provide examples of tasks for which PBPs do
improve quality. For such tasks, the effectiveness of PBPs is not too sensitive
to the threshold for quality required to receive the bonus, while the magnitude
of the bonus must be large enough to make the reward salient. We also present
examples of tasks for which PBPs do not improve quality. Our results suggest
that for PBPs to improve quality, the task must be effort-responsive: the task
must allow workers to produce higher quality work by exerting more effort. We
also give a simple method to determine if a task is effort-responsive a priori.
Furthermore, our experiments suggest that all payments on Mechanical Turk are,
to some degree, implicitly performance-based in that workers believe their work
may be rejected if their performance is sufficiently poor. Finally, we propose
a new model of worker behavior that extends the standard principal-agent model
from economics to include a worker's subjective beliefs about his likelihood of
being paid, and show that the predictions of this model are in line with our
experimental findings. This model may be useful as a foundation for theoretical
studies of incentives in crowdsourcing markets.
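The extended principal-agent model admits a small numerical sketch. All payments, costs, and subjective probabilities below are hypothetical numbers chosen for illustration, not estimates from the paper's experiments; the point is that once the acceptance probability depends on effort, even a flat payment is implicitly performance-based, and a bonus can tip the worker toward high effort.

```python
def best_effort(base_pay, bonus, effort_levels):
    """effort_levels: list of (name, cost, p_accept, p_bonus), where
    p_accept is the worker's subjective probability of being paid at all
    and p_bonus of clearing the bonus threshold. Returns the level that
    maximizes expected payment minus effort cost."""
    def utility(level):
        _, cost, p_accept, p_bonus = level
        return p_accept * base_pay + p_bonus * bonus - cost
    return max(effort_levels, key=utility)

# Hypothetical: high effort costs more but raises both the subjective
# acceptance probability and the chance of earning the bonus.
levels = [
    ("low",  0.01, 0.80, 0.10),
    ("high", 0.12, 0.99, 0.70),
]
print(best_effort(base_pay=0.50, bonus=0.00, effort_levels=levels)[0])  # low
print(best_effort(base_pay=0.50, bonus=0.30, effort_levels=levels)[0])  # high
```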
Almost-Tight Distributed Minimum Cut Algorithms
We study the problem of computing the minimum cut in weighted distributed message-passing networks (the CONGEST model). Let $\lambda$ be the minimum cut, $n$ be the number of nodes in the network, and $D$ be the network diameter. Our algorithm can compute $\lambda$ exactly in $O((\sqrt{n}\log^{*}n + D)\lambda^{4}\log^{2}n)$ time. To the best of our knowledge, this is the first paper that explicitly studies computing the exact minimum cut in the distributed setting. Previously, non-trivial sublinear time algorithms for this problem were known only for unweighted graphs when $\lambda \leq 3$, due to Pritchard and Thurimella's $O(D)$-time and $O(D + n^{1/2}\log^{*}n)$-time algorithms for computing 2-edge-connected and 3-edge-connected components.

By using Karger's edge sampling technique, we can convert this algorithm into a $(1+\epsilon)$-approximation $O((\sqrt{n}\log^{*}n + D)\epsilon^{-5}\log^{3}n)$-time algorithm for any $\epsilon > 0$. This improves over the previous $(2+\epsilon)$-approximation and $O(\epsilon^{-1})$-approximation algorithms of Ghaffari and Kuhn. Due to the lower bound of $\Omega(D + \sqrt{n/\log n})$ by Das Sarma et al., which holds for any approximation algorithm, this running time is tight up to a polylogarithmic factor.

To get the stated running time, we developed an approximation algorithm which combines the ideas of Thorup's algorithm and Matula's contraction algorithm; it saves a $\mathrm{poly}(\log n, 1/\epsilon)$ factor as compared to applying Thorup's tree packing theorem directly. Then, we combine Kutten and Peleg's tree partitioning algorithm and Karger's dynamic programming to achieve an efficient distributed algorithm that finds the minimum cut when we are given a spanning tree that crosses the minimum cut exactly once.
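For reference, the contraction idea at the heart of Matula's and Karger's algorithms is easiest to see in Karger's classic sequential form. The sketch below handles unweighted multigraphs only and is purely illustrative; the distributed, weighted machinery described above is substantially more involved.

```python
import random

def contract_once(edges, n, rng):
    """One run of Karger's random contraction on a multigraph given as a
    list of (u, v) edges over vertices 0..n-1; returns the cut size found."""
    parent = list(range(n))
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    remaining = n
    while remaining > 2:
        u, v = rng.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:          # contract: merge the two super-nodes
            parent[ru] = rv
            remaining -= 1
    # Edges whose endpoints lie in different super-nodes cross the cut.
    return sum(1 for u, v in edges if find(u) != find(v))

def karger_min_cut(edges, n, runs=200, seed=0):
    """Repeating the contraction ~n^2 log n times finds the minimum cut
    with high probability; a fixed run count is used here for brevity."""
    rng = random.Random(seed)
    return min(contract_once(edges, n, rng) for _ in range(runs))

print(karger_min_cut([(0, 1), (1, 2), (2, 3), (3, 0)], 4))  # 4-cycle: cut 2
```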
Quantum effect induced reverse kinetic molecular sieving in microporous materials
We report kinetic molecular sieving of hydrogen and deuterium in zeolite rho at low temperatures, using atomistic molecular dynamics simulations that incorporate quantum effects via the Feynman-Hibbs approach. We find that the diffusivities of confined molecules decrease when quantum effects are considered, in contrast with bulk fluids, which show an increase. Indeed, at low temperatures, a reverse kinetic sieving effect is demonstrated in which the heavier isotope, deuterium, diffuses faster than hydrogen. At 65 K, the flux selectivity is as high as 46, indicating good potential for isotope separation.
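The Feynman-Hibbs approach replaces the classical pair potential with a temperature- and mass-dependent effective potential, which is what lets a classical simulation distinguish the isotopes. Below is a minimal sketch of the quadratic Feynman-Hibbs correction applied to a Lennard-Jones potential; the LJ parameters are illustrative placeholders, not the hydrogen-zeolite force field used in the paper.

```python
HBAR = 1.054571817e-34   # J*s
KB = 1.380649e-23        # J/K
AMU = 1.66053906660e-27  # kg

def lj(r, eps, sigma):
    """Lennard-Jones pair potential U(r)."""
    sr6 = (sigma / r) ** 6
    return 4 * eps * (sr6 ** 2 - sr6)

def lj_d1(r, eps, sigma):
    """First derivative dU/dr."""
    return 4 * eps * (-12 * sigma**12 / r**13 + 6 * sigma**6 / r**7)

def lj_d2(r, eps, sigma):
    """Second derivative d2U/dr2."""
    return 4 * eps * (156 * sigma**12 / r**14 - 42 * sigma**6 / r**8)

def feynman_hibbs(r, eps, sigma, mass_kg, temp_k):
    """Quadratic Feynman-Hibbs effective potential:
    U_FH(r) = U(r) + hbar^2/(24*mu*kB*T) * (U''(r) + 2*U'(r)/r),
    with reduced mass mu = m/2 for a pair of identical molecules."""
    mu = mass_kg / 2
    prefactor = HBAR**2 / (24 * mu * KB * temp_k)
    return lj(r, eps, sigma) + prefactor * (
        lj_d2(r, eps, sigma) + 2 * lj_d1(r, eps, sigma) / r)

# Illustrative H2-like LJ parameters at 65 K. The correction scales as
# 1/mass, so it is about twice as large for H2 as for D2: quantum effects
# make the lighter isotope effectively "bigger" in a narrow pore.
eps, sigma, temp = 36.7 * KB, 2.96e-10, 65.0
for name, m in (("H2", 2.016 * AMU), ("D2", 4.028 * AMU)):
    print(name, feynman_hibbs(3.2e-10, eps, sigma, m, temp) / KB, "K")
```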
Tunable Electron Multibunch Production in Plasma Wakefield Accelerators
Synchronized, independently tunable and focused J-class laser pulses are
used to release multiple electron populations via photo-ionization inside an
electron-beam driven plasma wave. By varying the laser foci in the laboratory
frame and the position of the underdense photocathodes in the co-moving frame,
the delays between the produced bunches and their energies are adjusted. The
resulting multibunches have ultra-high quality and brightness, allowing for
hitherto impossible bunch configurations such as spatially overlapping bunch
populations with strictly separated energies, which opens up a new regime for
light sources such as free-electron lasers.
Climatologies at high resolution for the earth's land surface areas
High-resolution information on climatic conditions is essential to many applications in environmental sciences. Here we present the CHELSA algorithm to downscale temperature and precipitation estimates from the European Centre for Medium-Range Weather Forecasts (ECMWF) interim climatic reanalysis (ERA-Interim) to a high resolution of 30 arc sec. The algorithm for temperature is based on a statistical downscaling of atmospheric temperature from the ERA-Interim climatic reanalysis. The precipitation algorithm incorporates orographic predictors such as wind fields, valley exposition, and boundary layer height, and a bias correction using Global Precipitation Climatology Centre (GPCC) gridded data and Global Historical Climatology Network (GHCN) station data. The resulting data consist of a monthly temperature and precipitation climatology for the years 1979-2013. We present a comparison of data derived from the CHELSA algorithm with two other high-resolution gridded products with overlapping temporal resolution (the Tropical Rainfall Measuring Mission (TRMM) for precipitation and the Moderate Resolution Imaging Spectroradiometer (MODIS) for temperature) and with station data from the Global Historical Climatology Network (GHCN). We show that the climatological data from CHELSA have an accuracy similar to that of other products for temperature, but that the predictions of orographic precipitation patterns are both better and at a higher spatial resolution.
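The essence of elevation-based temperature downscaling fits in a few lines. The sketch below is deliberately simplified: it uses a fixed environmental lapse rate (an assumption; CHELSA derives lapse rates from the ERA-Interim atmospheric column rather than using a constant).

```python
def downscale_temperature(t_coarse_c, z_coarse_m, z_fine_m,
                          lapse_rate_k_per_m=0.0065):
    """Shift a coarse-cell temperature (deg C) from the coarse grid's
    mean elevation to a fine-grid elevation using a constant 6.5 K/km
    lapse rate (assumed here for illustration)."""
    return t_coarse_c + lapse_rate_k_per_m * (z_coarse_m - z_fine_m)

# A 10 deg C coarse cell with mean elevation 500 m, downscaled to a
# 2000 m ridge and a 200 m valley inside that cell:
print(downscale_temperature(10.0, 500, 2000))  # 0.25 (colder aloft)
print(downscale_temperature(10.0, 500, 200))   # 11.95 (warmer below)
```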
A 2k-Vertex Kernel for Maximum Internal Spanning Tree
We consider the parameterized version of the maximum internal spanning tree problem, which, given an $n$-vertex graph and a parameter $k$, asks for a spanning tree with at least $k$ internal vertices. Fomin et al. [J. Comput. System Sci., 79:1-6] crafted a very ingenious reduction rule, and showed that a simple application of this rule is sufficient to yield a $3k$-vertex kernel. Here we propose a novel way to use the same reduction rule, resulting in an improved $2k$-vertex kernel. Our algorithm applies first a greedy procedure consisting of a sequence of local exchange operations, which ends with a local-optimal spanning tree, and then uses this special tree to find a reducible structure. As a corollary of our kernel, we obtain a deterministic algorithm for the problem running in time $4^{k} \cdot n^{O(1)}$.
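To make the greedy phase concrete, here is a generic local-exchange sketch. It is not the paper's exact procedure (the paper's exchange operations and local-optimality condition are more specific); it merely illustrates the shape of such a routine: repeatedly swap a non-tree edge for a tree edge on the induced cycle whenever the swap increases the number of internal vertices.

```python
from collections import defaultdict

def build_adj(tree):
    """Adjacency sets for a tree given as a set of frozenset edges."""
    adj = defaultdict(set)
    for e in tree:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    return adj

def internal_count(adj):
    """Vertices of degree >= 2 in the tree are internal."""
    return sum(1 for v in adj if len(adj[v]) >= 2)

def tree_path(adj, u, v):
    """Edges on the unique u-v path in the tree (iterative DFS)."""
    parent, stack = {u: None}, [u]
    while stack:
        x = stack.pop()
        if x == v:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path, x = [], v
    while parent[x] is not None:
        path.append((parent[x], x))
        x = parent[x]
    return path

def local_exchange(edges, tree_edges):
    """Greedily improve a spanning tree by single edge swaps that
    strictly increase the number of internal vertices."""
    tree = {frozenset(e) for e in tree_edges}
    improved = True
    while improved:
        improved = False
        adj = build_adj(tree)
        base = internal_count(adj)
        for u, v in edges:
            if frozenset((u, v)) in tree:
                continue
            for a, b in tree_path(adj, u, v):
                cand = (tree - {frozenset((a, b))}) | {frozenset((u, v))}
                if internal_count(build_adj(cand)) > base:
                    tree, improved = cand, True
                    break
            if improved:
                break
    return [tuple(e) for e in tree]

# In K4, a star (one internal vertex) improves to a path (two).
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(local_exchange(k4, [(0, 1), (0, 2), (0, 3)]))
```

Each accepted swap raises the internal-vertex count by at least one, so the loop terminates after at most $n$ improvements with a local-optimal tree of the kind the kernelization then exploits.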